45 research outputs found

    Translating nucleic acid binding protein function from model species to minor crops using transfer learning

    Genomic elements such as proteins and genes are the basic units of the genome and are involved in every biological process. Predicting the function of these elements is therefore the first step towards understanding how plants function under various stress conditions. To date, various computational methods have been developed to predict the function of a given protein sequence. The recent proliferation of such methods has created its own problems, making them difficult to apply to newly sequenced genomes, especially those of non-model crops. For these reasons, there is an immediate need for more sophisticated computational methods to predict the function of a given protein sequence. This thesis presents three novel computational tools, based on transfer learning algorithms, that predict the function of a given protein sequence: 1) TL-RBPPred, for prediction of RNA-binding proteins, outperformed SPOT-Seq, RNApred, RBPPred and BLASTp on HumanSet (AUC of 0.977), YeastSet (AUC of 0.971), ArabidopsisSet (AUC of 0.972) and GlymaxSet (AUC of 0.97); 2) TL-DBPPred, for prediction of DNA-binding proteins, outperformed DNABP, enDNA-Prot, iDNA-Prot, nDNAProt, iDNA-Prot|Dis, DNAbinder and BLASTp on a testing dataset (AUC of 0.988); and 3) TL-TFPred, for prediction of transcription factors, outperformed PlantTFcat, iTAK and BLASTp on a testing dataset (AUC of 0.999) in terms of prediction accuracy. Further, both TL-RBPPred and TL-DBPPred were tested on the transcriptome of the non-model crop Bambara groundnut (Vigna subterranea (L.) Verdc.) to identify RNA-binding and DNA-binding proteins, respectively. These tests indicated that the two methods outperformed existing state-of-the-art tools such as SPOT-Seq, RBPPred, iDNA-Prot and iDNA-Prot|Dis in terms of prediction accuracy (AUC).
Based on this performance, the developed methods will be useful in predicting the function of given protein sequences (DNA-binding, RNA-binding and transcription factor) of model species as well as non-model crops.
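
The abstract above compares tools by AUC (area under the ROC curve). As a minimal sketch of what that metric measures, and not the evaluation code used in the thesis, the AUC can be computed as the fraction of positive/negative pairs in which the positive example receives the higher score. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: probability that a random positive example
    outscores a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = binding protein, 0 = non-binding protein.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every positive outranks every negative, and 0.5 corresponds to random ranking.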

    The barley pan-genome reveals the hidden legacy of mutation breeding

    Genetic diversity is key to crop improvement. Owing to pervasive genomic structural variation, a single reference genome assembly cannot capture the full complement of sequence diversity of a crop species (known as the ‘pan-genome’). Multiple high-quality sequence assemblies are an indispensable component of a pan-genome infrastructure. Barley (Hordeum vulgare L.) is an important cereal crop with a long history of cultivation that is adapted to a wide range of agro-climatic conditions. Here we report the construction of chromosome-scale sequence assemblies for the genotypes of 20 varieties of barley (comprising landraces, cultivars and a wild barley) that were selected as representatives of global barley diversity. We catalogued genomic presence/absence variants and explored the use of structural variants for quantitative genetic analysis through whole-genome shotgun sequencing of 300 gene bank accessions. We discovered abundant large inversion polymorphisms and analysed in detail two inversions that are frequently found in current elite barley germplasm; one is probably the product of mutation breeding and the other is tightly linked to a locus that is involved in the expansion of geographical range. This first-generation barley pan-genome makes previously hidden genetic variation accessible to genetic studies and breeding.

    Identification of gene modules associated with low temperatures response in Bambara groundnut by network-based analysis

    Bambara groundnut (Vigna subterranea (L.) Verdc.) is an African legume and a promising underutilized crop with good seed nutritional value. Low night-time temperatures in a number of African countries, such as Botswana, can affect the growth and development of bambara groundnut, leading to losses in potential crop yield. Therefore, in this study we developed a computational pipeline to identify and analyze the genes and gene modules associated with low temperature stress responses in bambara groundnut using the cross-species microarray technique (as bambara groundnut has no microarray chip) coupled with network-based analysis. Analyses of the bambara groundnut transcriptome using cross-species gene expression data resulted in the identification of 375 and 659 differentially expressed genes (p<0.01) under the sub-optimal (23°C) and very sub-optimal (18°C) temperatures, respectively, of which 110 genes are shared between the two stress conditions. The construction of a Highest Reciprocal Rank-based gene co-expression network, followed by its partition using a Heuristic Cluster Chiseling Algorithm, resulted in the identification of 6 and 7 gene modules under sub-optimal and very sub-optimal temperature stress, respectively. Modules of sub-optimal temperature stress are principally enriched with carbohydrate and lipid metabolic processes, while most of the modules of very sub-optimal temperature stress are significantly enriched with responses to stimuli and various metabolic processes. Several transcription factors (from the MYB, NAC, WRKY, WHIRLY and GATA classes) that may regulate the downstream genes involved in response to stimulus, allowing the plant to withstand very sub-optimal temperature stress, were highlighted. The identified gene modules could be useful in breeding for low-temperature stress tolerant bambara groundnut varieties.
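
The Highest Reciprocal Rank (HRR) step mentioned in the abstract above can be illustrated with a small sketch. It assumes the usual definition, in which the HRR of two genes is the larger of their mutual co-expression ranks, and keeps an edge when that value is at or below a cutoff. The Pearson correlation measure, the cutoff `max_rank`, the gene names and the expression values are all illustrative assumptions rather than the study's settings, and the module-partitioning step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def hrr_edges(expr, max_rank):
    """Return {(gene_a, gene_b): hrr} for pairs whose HRR <= max_rank."""
    genes = sorted(expr)
    rank = {}
    for g in genes:
        # Rank every other gene by its correlation with g (1 = most similar).
        others = sorted(
            (h for h in genes if h != g),
            key=lambda h: pearson(expr[g], expr[h]),
            reverse=True,
        )
        for r, h in enumerate(others, start=1):
            rank[(g, h)] = r
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            hrr = max(rank[(g, h)], rank[(h, g)])  # the worse of the two ranks
            if hrr <= max_rank:
                edges[(g, h)] = hrr
    return edges


# Hypothetical expression profiles across four samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
edges = hrr_edges(expr, max_rank=2)
print(edges)
```

Because the score is built from ranks rather than raw correlation values, a pair is connected only when each gene is among the other's closest neighbours.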

    Unraveling 14-3-3 proteins in C4 panicoids with emphasis on model plant Setaria italica reveals phosphorylation-dependent subcellular localization of RS splicing factor.

    14-3-3 proteins are a large multigenic family of regulatory proteins ubiquitously found in eukaryotes. In plants, 14-3-3 proteins are reported to play a significant role in both development and response to stress stimuli. Considering their importance, genome-wide analyses have been performed in many plants, including Arabidopsis, rice and soybean. To date, however, no comprehensive investigation has been conducted in any C4 panicoid crop. In view of this, the present study identified 8, 5 and 26 potential 14-3-3 gene family members in foxtail millet (Si14-3-3), sorghum (Sb14-3-3) and maize (Zm14-3-3), respectively. In silico characterization revealed large variations in their gene structures; segmental and tandem duplications have played a major role in the expansion of these genes in foxtail millet and maize. Gene ontology annotation showed the participation of 14-3-3 proteins in diverse biological processes and molecular functions, and in silico expression profiling indicated their higher expression in all the investigated tissues. Comparative mapping was performed to derive the orthologous relationships between 14-3-3 genes of foxtail millet and other Poaceae members, which showed a higher, as well as similar, percentage of orthology among these crops. Expression profiling of Si14-3-3 genes at different time-points of abiotic stress and hormonal treatments showed a differential expression pattern of these genes, and sub-cellular localization studies revealed the site of action of Si14-3-3 proteins within the cells. Further downstream characterization indicated that Si14-3-3 interacts with a nucleocytoplasmic shuttling phosphoprotein (SiRSZ21A) in a phosphorylation-dependent manner, which suggests that Si14-3-3 might regulate splicing events by binding to phosphorylated SiRSZ21A.
Taken together, the present study is a comprehensive analysis of 14-3-3 gene family members in foxtail millet, sorghum and maize. It provides information on their gene structure, protein domains, phylogenetic and evolutionary relationships, and expression patterns during abiotic stresses and hormonal treatments, which could be useful in choosing candidate members for further functional characterization. In addition, the demonstrated interaction between Si14-3-3 and SiRSZ21A provides novel clues on the involvement of 14-3-3 proteins in splicing events.

    Identification and Molecular Characterization of MYB Transcription Factor Superfamily in C4 Model Plant Foxtail Millet (Setaria italica L.)

    MYB proteins represent one of the largest transcription factor families in plants, playing important roles in diverse developmental and stress-responsive processes. Considering their significance, several genome-wide analyses have been conducted in almost all land plants except foxtail millet. Foxtail millet (Setaria italica L.) is a model crop for investigating the systems biology of millets and bioenergy grasses. Further, the crop is also known for its potential abiotic stress tolerance. In this context, a comprehensive genome-wide survey was conducted and 209 MYB protein-encoding genes were identified in foxtail millet. All 209 S. italica MYB (SiMYB) genes were physically mapped onto the nine chromosomes of foxtail millet. A gene duplication study showed that segmental and tandem duplications have occurred in the genome, resulting in expansion of this gene family. The protein domain investigation classified SiMYB proteins into three classes according to the number of MYB repeats present. The phylogenetic analysis categorized SiMYBs into ten groups (I-X). SiMYB-based comparative mapping revealed maximum orthology between foxtail millet and sorghum, followed by maize, rice and Brachypodium. Heat map analysis showed tissue-specific expression patterns of predominant SiMYB genes. Expression profiling of candidate MYB genes against abiotic stresses and hormone treatments using qRT-PCR revealed specific and/or overlapping expression patterns of SiMYBs. Taken together, the present study provides a foundation for evolutionary and functional characterization of MYB TFs in foxtail millet to dissect their functions in response to environmental stimuli.

    Genome-wide development of transposable elements-based markers in foxtail millet and construction of an integrated database

    Transposable elements (TEs) are major components of plant genomes and are reported to play significant roles in functional genome diversity and phenotypic variation. Several TEs are highly polymorphic for insertion location in the genome, and this facilitates the development of TE-based markers for various genotyping purposes. Considering this, a genome-wide analysis was performed in the model plant foxtail millet. A total of 30,706 TEs were identified and classified as DNA transposons (24,386), full-length Copia type (1,038), partial or solo Copia type (10,118), full-length Gypsy type (1,570), partial or solo Gypsy type (23,293) and Long- and Short-Interspersed Nuclear Elements (3,659 and 53, respectively). Further, 20,278 TE-based markers were developed, namely Retrotransposon-Based Insertion Polymorphisms (4,801, ∼24%), Inter-Retrotransposon Amplified Polymorphisms (3,239, ∼16%), Repeat Junction Markers (4,451, ∼22%), Repeat Junction-Junction Markers (329, ∼2%), Insertion-Site-Based Polymorphisms (7,401, ∼36%) and Retrotransposon-Microsatellite Amplified Polymorphisms (57, 0.2%). A total of 134 Repeat Junction Markers were screened in 96 accessions of Setaria italica and 3 wild Setaria accessions, of which 30 showed polymorphism. Moreover, an open access database for these developed resources was constructed (Foxtail millet Transposable Elements-based Marker Database; http://59.163.192.83/ltrdb/index.html). Taken together, this study would serve as a valuable resource for large-scale genotyping applications in foxtail millet and related grass species.
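
The marker breakdown quoted above can be checked with a few lines of arithmetic. This is a sketch that uses only the counts stated in the abstract; the variable names are illustrative.

```python
# Marker counts as quoted in the abstract.
marker_counts = {
    "Retrotransposon-Based Insertion Polymorphisms": 4801,
    "Inter-Retrotransposon Amplified Polymorphisms": 3239,
    "Repeat Junction Markers": 4451,
    "Repeat Junction-Junction Markers": 329,
    "Insertion-Site-Based Polymorphisms": 7401,
    "Retrotransposon-Microsatellite Amplified Polymorphisms": 57,
}

total = sum(marker_counts.values())

# Share of each marker type as a percentage of all TE-based markers.
shares = {name: 100.0 * n / total for name, n in marker_counts.items()}

print(f"total markers: {total}")
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

The six counts sum to the stated 20,278 markers, and the rounded shares match the approximate percentages given for the five larger classes. The smallest class works out to roughly 0.28%, which the abstract reports as 0.2%.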

    Chromosomal distribution and duplication of 14-3-3 genes.

    (a) Maize. (b) Sorghum. (c) Foxtail millet. Vertical bars represent chromosomes and the values at left indicate the position of genes in megabases. Genes are marked on the right; Zm, Sb and Si denote Zea mays, Sorghum bicolor and Setaria italica, respectively. Lines denote segmental duplication, whereas tandemly duplicated genes are highlighted in boxes.

    Si14-3-3 and mutant SiRSZ21A (m2 and m3) co-expressed in onion epidermal cells.

    Localization of m2 in the nucleus is evident. (A) Whole cell; (B) magnified view of nucleus.